A model for cooperative scientific research inspired by the ant colony algorithm

Modern scientific research has become largely a cooperative activity in the Internet age. We build a simulation model to understand the population-level creativity based on the heuristic ant colony algorithm. Each researcher has two heuristic parameters characterizing the goodness of his own judgments and his trust on literature. We study how the distributions of contributor heuristic parameters change with the research problem scale, stage of the research problem, and computing power available. We also identify situations where path dependence and hasty research due to the pressure on productivity can significantly impede the long-term advancement of scientific research. Our work provides some preliminary understanding and guidance for the dynamical process of cooperative scientific research in various disciplines.


Introduction
Cooperative scientific research is a new trend in the science community nowadays due to the growth of number of researchers [1][2][3], the faster propagation of knowledge through the Internet [4][5][6][7] and the many new interdisciplinary research topics [8,9], etc. Research groups ranging from a few scientists to international institutions can study related problems and build upon each other's works. In the early pioneering days, the activity of scientific research was the solitary work of a few geniuses of the world and the spirits of independent thinking and skepticism were highly valued. In modern days, we are seeing more and more scientific achievements made by the progressive efforts of many researchers [10]. This paradigm shift accompanies the development of complexity science itself [8,9,11]. Scientists in the Internet age work like a highly cooperative ant colony connected by pheromone, i.e., research publications, and exhibit population-level creativity which requires modeling to understand. Recent observations on the slow-down of innovation and productivity spurred questions on how to optimize the process of innovative research in today's science communities. [12][13][14][15][16].
Previous studies on scientific research and collective intelligence have discussed various aspects of this topic including the citation system [17][18][19][20][21], evaluation and funding system [22][23][24], game theory competition and cooperation [25][26][27][28], complex networks [29,30], team size and composition management [31][32][33], and so on [34][35][36][37][38][39]. In this paper, we build a simplified a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 model inspired by the ant colony optimization (ACO) algorithm [40][41][42] to study the dynamical process of cooperative scientific research by computer simulations. Compared with previous works on the "science of science", our work offers a new perspective of viewing scientific production as running an optimization algorithm and improving ways of doing science as improving the algorithm. This viewpoint bridges knowledge from various disciplines and can help gain deeper understanding about the influencing factors of scientific production, complementing regressive models of real-world statistical data. More specifically, our ant colony model can obtain the optimal research styles for various types of scientific problems, e.g., simple (elemental) v.s. complex, new v.s. old (long-standing), etc., and study the influence of computing power and different survival rules on selecting researchers for the community.
We suppose that in the ant community, each scientific problem they study is a randomly generated traveling salesman problem (TSP) [43] with N vertices, where N controls the complexity of the problem. A researcher's effort on such a problem is modeled as making small decisions step by step to connect the vertices and find a plausible path. He will then pass on the knowledge by publications, i.e., leaving pheromone on the edges visited. The shorter the total path, the more pheromone will be assigned. Every decision made is governed by two heuristic parameters: α characterizing the researcher's trust on published literature, i.e., the pheromone left on all edges, and β characterizing his trust on the greedy local distance measure, i.e., the researcher's own sense of direction. The procedure is iterated as generations of researchers attempt for better solutions. Finally, the accumulated pheromone concentrates on the shortest TSP path found by the community, which represents the currently best answer known to the scientific problem.
Two essential ingredients of our ant colony model are the NP-hardness of TSP and the pheromone mechanism in ACO. Since TSP is NP-hard, it is easy to evaluate and compare path lengths and exclude the longer path as 'wrong', but difficult to know if the shorter path is indeed shortest, which is similar to open questions in science that satisfy the falsifiability criterion. The pheromone is a population-level information sharing mechanism that enables researchers to work out difficult scientific problems cooperatively. Our ant colony model develops the ACO in that we have improved the pheromone update rules and allow the heuristic parameters α, β to differ individually and change by evolution. We can then study the equilibrium distributions of α, β given different problem scales N and different numbers of ACO iterations that distinguish between new and old problems. The influence of computing power will be modeled by introducing the Hamiltonian cycle speedup [44] that mimics the role of computers.
We find that cooperative scientific research is a good way to tackle complex scientific problems. The effects of using computers or lab robots are also found to be positive. The main drawback factors of scientific production found in this model are path-dependence and hasty research. Our results suggest that these issues can be improved by adopting better organizations, evaluation systems and incentive policies. For example, parallel development of several independent communities can reduce path dependence and giving more credits to late contributors of long-standing problems can avoid hasty research.

The core ACO algorithm
In the ACO algorithm, each ant with two heuristic parameters α, β tries to find a TSP path individually. The ant has its own memory of the set of vertices S that has been visited and has access to the community-shared information d ij , the distance between vertices i, j and τ ij , the amount of pheromone on the undirectioned edge i − j. The ant picks a random vertex to start the trip. Then each step from vertex i to vertex j is determined by the transition probability The probabilities P i!j are normalized for all j = 2 S to determine the next stop j. Then vertex j is added to set S so it will not be repeatedly visited. Eq (1) describes the basic rules of the heuristically biased self-avoiding walk (SAW) [45,46] of ants in the original ACO algorithm. We have added a small background value of 0.01 to τ ij in Eq (1) so that the ants do not get oversensitive to small amounts of pheromone.
We have also made improvements in the pheromone update rules. After N ants = 50 ants have finished their TSP paths, we pick the best p(t) percent and allow these winning ants to leave pheromone over their TSP paths bidirectionally. The amount of pheromone on each edge is inversely proportional to the total path length and proportional to a linearly decreasing weight of the path ranking. Long-distance steps on the TSP path gets extra penalties. The pheromone on all edges then evaporates by p(t) percent and the above procedure iterates while the percentage p(t) gradually decreases from 50% to 8% (4 ants) over the iterations.
These improvements mean that initially the ant colony is very eager to accumulate pheromone and later, the update rules get tighter as the best-known path of the ant colony becomes nearly optimal. But any time, an ant who beats the best-known path always immediately becomes the leader of the top 4 ants and leaves the most pheromone to the whole ant colony. With the improved pheromone update rules, the pheromone becomes a more useful guide to the ants and better resembles the literature publication system in academia. More details of the model can be found in our Matlab code provided upon reasonable request.

Evolution of heuristic parameters
In the original ACO algorithm, the parameters α, β were set manually as hyper-parameters and applied to all ants. In our model, we allow α, β to take different values for different ants and evolve the distribution P α,β by training the ant colony with randomly generated TSP graphs. During the solution of one graph, P α,β is kept unchanged and the heuristic parameters of the ants who found shorter TSP paths than the best-known path are recorded. These ants are called contributors and their α, β values are used for evolving P α,β according to P ðnewÞ Here n c is the number of contributors recorded during one graph and M = 4000 is the total pool of ants out of which the N ants = 50 ants are sampled in each ACO iteration. When equilibrium is reached, every ant in the colony is equally likely to become a contributor. More favorable (α, β) values will attract more ants and less favorable values will be adopted by fewer ants. We then consider a more sophisticated situation where the trained distribution P α,β (t) can depend on problem stage t. To do this, we record for each contributor not only its α, β values, but also the number of ACO iterations t performed when its contribution is made. We can then compare at equilibrium the distributions P α,β (t) suitable for different problem stages t.

Hamiltonian cycle speedup
The Hamiltonian cycle speedup [44] is often used in conjunction with ACO to speed up its convergence. In the core ACO algorithm, at every vertex i, the ant heuristically picks a next vertex j based on P i!j , which mimics human intuition. The TSP path obtained is called a Hamiltonian cycle in graph theory, e.g., The Hamiltonian cycle speedup plays the role of a computer exhaustively checking human errors. It enumerates all segments i ! � � � ! j of the Hamiltonian cycle in Eq (3) and checks if the cycle length can be made shorter by reversing the segment into j ! � � � ! i, which is true if and only if The exhaustive check continues until no such improvements are possible, which is a necessary but not sufficient condition for the optimality of the TSP path. We examine the influence of the Hamiltonian cycle speedup on the distribution P α,β in Fig 2c and 2d and use the simpler model without the speedup elsewhere.

Effect of problem scale
We first examine how the equilibrium distribution P α,β changes with problem scale N, i.e., the number of vertices in the TSP graph. The vertices are randomly sampled from a uniform distribution in a 2D unit square region. We have tried other vertex distributions (Gaussian, triangular) and other region shapes (rectangle, circle) and have found qualitatively the same results. As shown in Fig 1a-1f, the α peak significantly shifts to larger values as the problem scale N increases. This indicates that when faced with more difficult problems, the research community has to rely more on previous works for guidance to find sophisticated better solutions, and random trials ignoring literature is not as efficient. The β parameter governs the researcher's goodness of local distance measure or sense of direction. For small problems, the optimal distribution relies heavily on high β values. For larger problems, the weights of high β reach a plateau and the joint distribution P α,β develops a positive correlation between α and β, suggesting that the successful research style is a combination of the α and β heuristics. Such researchers always keeps up-to-date knowledge of the currently best solution known by the community (α heuristics) and quickly identifies where potential improvements are possible (β heuristics) around the community-found path.

Time-dependent distribution
Some difficult problems can persist for years or decades as researchers come and leave. In an ideal situation, researchers switching from problems to problems specialize to both the appropriate scale of complexity and the stage of problem conducive to their own research styles (α, β values). We therefore consider P α,β (t) with t being the number of ACO iterations for fixed problem scale N = 100. We train P α,β (t) to equilibrium and plot the results in Fig 2a in 4 colors corresponding to 4 problem stages: newly proposed (t = 1-5, blue), early (t = 6-30, green), intermediate (t = 31-100, yellow), and late (t = 101-1000, red) periods.
When a problem is newly proposed, the contributors (blue) generally have high β values. Since there are not many publications to read yet, researchers with low β will move randomly between the vertices and obtain TSP paths of order OðNÞ, while those with high β will always greedily choose the closest vertex to move to. The greedy solution can be estimated to be which is much better than a random self-avoiding walk OðNÞ. Therefore, all contributors of a newly proposed problem tend to have high β values. After the greedy solution has been found, the early-stage contributors (green) constitute the upper part of the time-independent distribution P α,β in Fig 1d. The intermediate (yellow) and late-stage (red) contributors then scan down to the lower part of P α,β and finally concentrate into a red blob below the mode peak of P α,β . The error curves of the time-dependent P α,β (t) and time-independent P α,β distributions are compared in Fig 2b. We use the relative error ε(t) = L(t)/L opt − 1 averaged over 500 graphs to evaluate the goodness of a given research condition, where L opt is the optimal path length obtained from the open-source exact TSP solver Concorde [47,48]. Initially, the greedy solution of P α,β (t) (yellow dashed line in Fig 2b) has an advantage over P α,β (blue line in Fig 2b), which does not last for very long. The blue and yellow curves nearly coincide when the problem reaches intermediate to late periods. The residue error at t = 10 3 remains *1.2%, which is

PLOS ONE
likely to result from the path dependence effect [49], i.e., the ant colony gets trapped to a local minimum found by previous works. If we have two or more independent research communities (green & purple lines in Fig 2b), which is realized by running the ant colony code multiple times and keeping the smallest TSP length of the trials at every iteration step t, the relative error ε(t) has a statistically significant reduction. We use the standard deviations of the data to determine the significance of the difference between these results at t = 10 3 . Using the standard deviations of the mean values as error bars, we have the final residues for one group of ants ε 1 (t = 10 3 ) = (1.21 ± 0.045)%, two groups ε 2 (t = 10 3 ) = (0.82 ± 0.033)%, and four groups ε 4 (t = 10 3 ) = (0.60 ± 0.026)%. In terms of the t-value, between ε 1 and ε 2 t 12 = 6.99, i.e. their difference is about 7 times of standard deviation. Between ε 2 and ε 4 , t 24 = 5.24. This confirms that the three sets of scores are significantly different.

Effects of computing power
We then move on to discuss the effects of more computing power, which is mimicked by introducing the Hamiltonian cycle speedup described previously. When individual researchers

PLOS ONE
have computers that help them do exhaustive trials and verifications, our results indicate that the selectivity effects on both the literature parameter α and the intuition parameter β of the contributors are significantly reduced. As is shown in Fig 2c and 2d, both the α peak and the β plateaus are made lower by introducing the Hamiltonian cycle speedup. This means that computers are a chance equalizer which diversifies the heuristic parameter distributions of the contributors. Also, the α peak slightly shifts to smaller values, which is due to the reduction of effective problem hardness when computers become available. In terms of relative error (red line in Fig 2b), the introduction of Hamiltonian cycle speedup significantly speeds up the convergence to the optimal TSP path. The residue error at t = 10 3 iterations *0.4% is made much smaller than the blue curve but still nonzero, which suggests that the path dependence effect of ant-colony research cannot be completely eliminated even with more computing power available to each researcher individually.

Hasty research
We often see in academia that researchers are faced with tight and pressing survival rules, most of which are achievement-based. We find in our model that sometimes these rules can be counter-productive to the science community. An important reason why this happens is that such rules would encourage researchers to focus on new or early-stage problems, leaving the late-stage problems simply "outdated" rather than solved.
We simulate such a situation and results are shown in Fig 3. Suppose a problem is interesting to the ant colony for only t � 50 iterations, after which the problem becomes old and out of attention. By training the ant colony using TSP graphs with N = 100 vertices under such hasty rules, the equilibrium distribution P αβ is given by the blue dots in Fig 3a. Conversely, if every graph is solved up to t = 10 3 iterations but only those contributors after t > 50 are recorded to update P αβ , the ant colony will be trained into the distribution of the red dots. The

PLOS ONE
green contours in Fig 3a show the normally trained distribution where all contributors are recorded to update P αβ . Since achievement-based survival rules pick out those contributors with big improvements of TSP lengths, which, according to the inset of Fig 3b, tend to be early-stage contributors, the distribution P αβ shifts to the blue side as a result.
We then plot in Fig 3b the error curves ε(t) of different P αβ distributions averaged over 500 graphs. The blue distribution has short-term benefits but long-term costs. A "hasty" ant colony adapted to early-stage problems would lack those ants with heuristic parameters conducive to making breakthroughs on long-standing problems and therefore become inefficient as problems approach late stages. Comparing the blue solid line (t � 50) and the green solid line (normally trained) in Fig 3b, their final residues are ε normal = (1.18 ± 0.048)%, ε t � 50 = (1.46 ± 0.049)%, significantly different with a t-value of t = 4.1. In reality, the combined effect of making the researcher community both inefficient and not interested in solving long-standing problems could be even worse, which can be mimicked by reducing N ants = 50e −(t−50)/200 after t > 50 (blue dashed line in Fig 3b). The residue error at t = 10 3 reaches * 2%. More interestingly, the normally trained N = 100 distribution (green line in Fig 3b) can be outperformed by the red distribution (red line in Fig 3b) in the long run, which suggests the importance of giving more weights to the late contributors.

Conclusion and discussion
We have established an ant-colony research model which enables us to understand the dynamical process of cooperative scientific research in various disciplines. Based on our model, we have made several interesting findings. First, as the problem scale increases, the contributors tend to have more cooperative heuristic parameters than those of simpler problems. Therefore, the cooperative mode of scientific research is a consequence of complexity science itself. Second, different problem stages will require different research styles. In the beginning, simple intuitive thinking can help lay down the general framework. Later, improvements become harder and require deeper thinking and more trials and errors. Third, the introduction of computers or any other advanced technology (such as lab robots) that enables exhaustive trials and verifications can give the human researcher more freedom, diversify the contributor population and make the scientific results more accurate and objective.
In addition to demonstrating the power of cooperative scientific research, our model can also simulate non-ideal situations and identify how things might go wrong. First is path dependence. As scientists build upon each other's works, there is inevitably some degree of path dependence. Parallel development of several independent communities, technological methods, or schools of thoughts can be better than having one unified community stuck with preestablished ideas and paradigms. This will require the science community to be more willing to publish currently suboptimal but new ideas. Second is hasty research. Putting pressure on productivity or individual achievements can lead to hasty research. It is important to give more credits to late contributors and solvers of long-standing problems for the long-term progress of science.
More future works can be done following our ant-colony model. For example, in the current version of our model, the ant colony works out independent TSP graphs with no shared vertices or edges, which can be added to model interrelated problems in different research fields. The ant colony can be broken down into smaller communities working on different TSP paths with limited communication. More heuristic parameters can be added and the model can be combined with genetic and/or memetic algorithms to test more ways of organizing the science community.